Lecture 17¶
# Standard Imports
import numpy as np
Numerical Differentiation¶
Example: Naive Computation with doubles¶
Consider the function $f(x) = e^x \sin x$ at the point $x=2.2$. We know from usual differentiation rules that $$f'(x)=e^x (\sin x+\cos x)$$ and so $f'(2.2) \approx 1.98546043105418...$ as computed below.
f(x) = exp(x)*sin(x)
f
x |--> e^x*sin(x)
df(x) = f.derivative().factor()
df
x |--> (cos(x) + sin(x))*e^x
df_evaluated = df(2.2)
df_evaluated
1.98546043105418
But suppose we didn't know what our function $f(x)$ was. We could attempt to compute the derivative using the approximation $$f'(a) \approx \frac{f(a+h)-f(a)}{h} \quad \text{for $h$ small.}$$ This is of course the slope of the secant line through $(a, f(a))$ and $(a+h, f(a+h))$, which approximates the tangent line, as depicted below.
a = 2.2
h = 2^-3
plot(f, (x, 2, 2.5)) + \
point2d([(a,f(a))], size=20, color='black', zorder=3) + \
line2d([(a, f(a)), (a+h, f(a+h))], color='red')
But how small should we make $h$? We will consider what values we get as approximate derivatives when $h$ has the form $h=2^{-k}$.
a = 2.2
for k in range(55):
h = 2^-k
approx = ( f(a+h) - f(a) ) / h
error = df_evaluated - approx
print(f"k = {k} approx = {float(approx):>19} error = {float(error):>10.3e}")
k = 0 approx = -8.728756395083243 error = 1.071e+01
k = 1 approx = -1.8747859924561432 error = 3.860e+00
k = 2 approx = 0.37579013960446517 error = 1.610e+00
k = 3 approx = 1.2535008533969574 error = 7.320e-01
k = 4 approx = 1.6367941031508337 error = 3.487e-01
k = 5 approx = 1.8153431617279239 error = 1.701e-01
k = 6 approx = 1.9014417757815636 error = 8.402e-02
k = 7 approx = 1.943709353603822 error = 4.175e-02
k = 8 approx = 1.9646492370991382 error = 2.081e-02
k = 9 approx = 1.9750708930669134 error = 1.039e-02
k = 10 approx = 1.9802696734077472 error = 5.191e-03
k = 11 approx = 1.9828660546445462 error = 2.594e-03
k = 12 approx = 1.9841634933982277 error = 1.297e-03
k = 13 approx = 1.9848120248498162 error = 6.484e-04
k = 14 approx = 1.98513624361658 error = 3.242e-04
k = 15 approx = 1.9852983412565663 error = 1.621e-04
k = 16 approx = 1.9853793871006928 error = 8.104e-05
k = 17 approx = 1.985419909353368 error = 4.052e-05
k = 18 approx = 1.9854401701595634 error = 2.026e-05
k = 19 approx = 1.9854502999223769 error = 1.013e-05
k = 20 approx = 1.9854553658515215 error = 5.065e-06
k = 21 approx = 1.9854578990489244 error = 2.532e-06
k = 22 approx = 1.985459167510271 error = 1.264e-06
k = 23 approx = 1.9854597970843315 error = 6.340e-07
k = 24 approx = 1.985460102558136 error = 3.285e-07
k = 25 approx = 1.985460251569748 error = 1.795e-07
k = 26 approx = 1.9854602813720703 error = 1.497e-07
k = 27 approx = 1.9854604005813599 error = 3.047e-08
k = 28 approx = 1.9854602813720703 error = 1.497e-07
k = 29 approx = 1.9854607582092285 error = -3.272e-07
k = 30 approx = 1.9854612350463867 error = -8.040e-07
k = 31 approx = 1.9854583740234375 error = 2.057e-06
k = 32 approx = 1.9854621887207031 error = -1.758e-06
k = 33 approx = 1.9854660034179688 error = -5.572e-06
k = 34 approx = 1.9854583740234375 error = 2.057e-06
k = 35 approx = 1.985443115234375 error = 1.732e-05
k = 36 approx = 1.98541259765625 error = 4.783e-05
k = 37 approx = 1.9854736328125 error = -1.320e-05
k = 38 approx = 1.985595703125 error = -1.353e-04
k = 39 approx = 1.9853515625 error = 1.089e-04
k = 40 approx = 1.9853515625 error = 1.089e-04
k = 41 approx = 1.984375 error = 1.085e-03
k = 42 approx = 1.984375 error = 1.085e-03
k = 43 approx = 1.984375 error = 1.085e-03
k = 44 approx = 1.984375 error = 1.085e-03
k = 45 approx = 2.0 error = -1.454e-02
k = 46 approx = 1.9375 error = 4.796e-02
k = 47 approx = 1.875 error = 1.105e-01
k = 48 approx = 1.75 error = 2.355e-01
k = 49 approx = 1.5 error = 4.855e-01
k = 50 approx = 2.0 error = -1.454e-02
k = 51 approx = 4.0 error = -2.015e+00
k = 52 approx = 0.0 error = 1.985e+00
k = 53 approx = 0.0 error = 1.985e+00
k = 54 approx = 0.0 error = 1.985e+00
Observe that the most accurate choice of $h$ is $h=2^{-27}$, and we get the derivative correct to only about $7$ or $8$ significant digits. This translates to $25$ bits of precision, which is roughly half of the $53$ bits of precision we might expect in a float. (You can convert from number of significant digits to number of significant bits by multiplying by $\log_2 10 \approx 3.3$.)
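(The digit-to-bit conversion is easy to check in Sage. The snippet below is just an added illustration of this arithmetic, not part of the original computation.)
# Converting significant digits to significant bits: multiply by log_2(10).
bits_per_digit = log(10, 2)         # exact value log(10)/log(2)
print(float(bits_per_digit))        # about 3.32
print(float(7.5 * bits_per_digit))  # 7.5 digits is roughly 25 bits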
First we should discuss why values of $k$ far away from $27$ are so bad:
- If $k$ is small, then the secant line just isn't very close to the graph.
- If $k$ is large, then we suffer from a loss of significance due to the subtraction in our formula.
Analyzing Loss of Significance¶
We will now analyze the error due to loss of significance. We will assume that near $a$, the function $f$ has the property that if $a$ and $a+h$ agree to $n$ significant bits, then $f(a)$ and $f(a+h)$ also agree to $n$ significant bits. This is the case for most well-written functions (but it depends on how a function is implemented on the computer).
Since $a=2.2$, the most significant bit has place value $2^1$. Since $h=2^{-k}$, the bits in place values $2^1, 2^0, \ldots, 2^{1-k}$ all agree between $a$ and $a+h$. Thus $a$ and $a+h$ agree to $k+1$ significant bits. Then by assumption $f(a)$ and $f(a+h)$ agree to $k+1$ significant bits. Thus computing $f(a+h)-f(a)$ results in a loss of $k+1$ significant bits. Since we are working with floats, we expect to have $53 - (k+1)= 52-k$ significant bits remaining. Division by $h$ will not result in any loss of precision (since $h$ is a power of two), so our estimate of $f'(a)$ should be correct to $52-k$ significant bits. Since $1 \leq f'(a)<2$, the place value $2^0$ is the most significant bit. So, we can only expect to be accurate to the bit representing place value $2^{k-51}$, which means that our expected error should be comparable in magnitude to $2^{k-52}$.
For large values of $k$, this explains the error. See the table below. Numbers are written in exponential notation, and observe the exponents roughly match.
a = 2.2
for k in range(30,55):
h = 2**-k
approx = ( f(a+h) - f(a) ) / h
error = df_evaluated - approx
expected_error = 2**(k-52)
print(f"k = {k:>2} error = {float(error):>10.3e} expected error = {float(expected_error):>10.3e}")
k = 30 error = -8.040e-07 expected error = 2.384e-07
k = 31 error = 2.057e-06 expected error = 4.768e-07
k = 32 error = -1.758e-06 expected error = 9.537e-07
k = 33 error = -5.572e-06 expected error = 1.907e-06
k = 34 error = 2.057e-06 expected error = 3.815e-06
k = 35 error = 1.732e-05 expected error = 7.629e-06
k = 36 error = 4.783e-05 expected error = 1.526e-05
k = 37 error = -1.320e-05 expected error = 3.052e-05
k = 38 error = -1.353e-04 expected error = 6.104e-05
k = 39 error = 1.089e-04 expected error = 1.221e-04
k = 40 error = 1.089e-04 expected error = 2.441e-04
k = 41 error = 1.085e-03 expected error = 4.883e-04
k = 42 error = 1.085e-03 expected error = 9.766e-04
k = 43 error = 1.085e-03 expected error = 1.953e-03
k = 44 error = 1.085e-03 expected error = 3.906e-03
k = 45 error = -1.454e-02 expected error = 7.812e-03
k = 46 error = 4.796e-02 expected error = 1.562e-02
k = 47 error = 1.105e-01 expected error = 3.125e-02
k = 48 error = 2.355e-01 expected error = 6.250e-02
k = 49 error = 4.855e-01 expected error = 1.250e-01
k = 50 error = -1.454e-02 expected error = 2.500e-01
k = 51 error = -2.015e+00 expected error = 5.000e-01
k = 52 error = 1.985e+00 expected error = 1.000e+00
k = 53 error = 1.985e+00 expected error = 2.000e+00
k = 54 error = 1.985e+00 expected error = 4.000e+00
How good is a secant approximation?¶
The key to understanding how good a secant approximation to the derivative is comes from Taylor's Theorem in the degree one case. Assuming our function is twice differentiable on the interval with endpoints $a$ and $a+h$, Taylor's Theorem says $$f(a+h) = f(a) + f'(a)h + \frac{f''(y)}{2}h^2$$ for some $y$ between $a$ and $a+h$. Manipulating this formula, we see that $$\frac{f(a+h)-f(a)}{h} = f'(a) + \frac{f''(y)}{2}h.$$ Assuming $f''(y)$ is nearly constant on the small interval with endpoints $a$ and $a+h$, we see that our error is roughly linear in $h$.
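To see this linear behavior in practice, here is a small added check (not part of the original tables; it reuses $f$ and df_evaluated from the cells above) comparing the actual error of the one-sided difference with the leading term $-\frac{f''(a)}{2}h$ suggested by Taylor's Theorem:
# Compare the actual error of the one-sided difference with the leading term
# -f''(a)*h/2 predicted by the degree-one Taylor expansion.
a = 2.2
d2f_a = f.derivative(x, 2)(a)
for k in range(4, 12):
    h = 2^-k
    approx = (f(a+h) - f(a)) / h
    actual_error = df_evaluated - approx
    predicted_error = -d2f_a * h / 2
    print(f"k = {k:>2} actual = {float(actual_error):>10.3e} predicted = {float(predicted_error):>10.3e}")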
An improvement¶
The formula above actually suggests an improvement. Here is our formula above: $$\frac{f(a+h)-f(a)}{h} = f'(a) + \frac{f''(y_+)}{2}h,$$ where we have used $y_+$ instead of $y$. Applying the formula to $-h$ as well gives a $y_-$ between $a-h$ and $a$ such that $$\frac{f(a)-f(a-h)}{h} = f'(a) - \frac{f''(y_-)}{2}h.$$ Averaging these two formulas gives that $$\frac{f(a+h)-f(a-h)}{2h} = f'(a) + \left(\frac{f''(y_+)}{4}- \frac{f''(y_-)}{4}\right)h.$$ It may not be clear why this is better. But, recall that we are assuming that $h$ is so small that $f''$ is roughly constant on the interval from $a-h$ to $a+h$. This means that the difference $f''(y_+)-f''(y_-)$ is very close to zero.
A better understanding of the improvement. Consider applying the degree two Taylor's theorem. It says that $$f(a+h) = f(a) + f'(a)h + \frac{f''(a)}{2}h^2 + \frac{f^{(3)}(y_+)}{6}h^3$$ for some $y_+$ between $a$ and $a+h$. Similarly we see that $$f(a-h) = f(a) - f'(a)h + \frac{f''(a)}{2}h^2 - \frac{f^{(3)}(y_-)}{6}h^3$$ for some $y_-$ between $a-h$ and $a$. Subtracting the two equations yields: $$f(a+h)-f(a-h) = 2 f'(a)h + \frac{f^{(3)}(y_+)+f^{(3)}(y_-)}{6} h^3.$$ Simplifying yields that $$f'(a) = \frac{f(a+h)-f(a-h)}{2h} - \frac{f^{(3)}(y_+)+f^{(3)}(y_-)}{12} h^2.$$ Again assuming that the third derivative is roughly constant, we see that the error term is roughly proportional to $h^2$. Quadratic ($h^2$) is much better than linear ($h$) since when $h$ is small, $h^2$ is even smaller.
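Here is the same style of check for the averaged (two-point) formula, added for illustration: assuming the leading error term is roughly $-\frac{f^{(3)}(a)}{6}h^2$ as derived above, the prediction and the actual error should roughly agree.
# Compare the two-point (central) difference error with the predicted leading
# term -f'''(a)*h^2/6 from the degree-two Taylor computation above.
a = 2.2
d3f_a = f.derivative(x, 3)(a)
for k in range(4, 12):
    h = 2^-k
    approx = (f(a+h) - f(a-h)) / (2*h)
    actual_error = df_evaluated - approx
    predicted_error = -d3f_a * h^2 / 6
    print(f"k = {k:>2} actual = {float(actual_error):>10.3e} predicted = {float(predicted_error):>10.3e}")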
The two-point approximation of the derivative.
Assuming $f$ has a continuous $3$rd derivative in a neighborhood of $a$, we have $$f'(a) = \frac{f(a+h)-f(a-h)}{2h} + {\mathcal O}(h^2) \quad \text{as $h \to 0$.}$$
Above we are using Big O notation. The term ${\mathcal O}(h^2)$ can be replaced by an "error" term $E(h)$, which in this case must be $$E(h) = f'(a) - \frac{f(a+h)-f(a-h)}{2h}.$$ Big O notation indicates that there are constants $M>0$ and $\delta>0$ such that $$|E(h)| < M h^2 \quad \text{whenever $0<|h|<\delta$.}$$
a = 2.2
h = 2^-3
plot(f, (x, 2, 2.5)) + \
plot(lambda x: df(a)*(x-a) + f(a), (x, 2, 2.5), color='black') + \
point2d([(a,f(a))], size=20, color='black', zorder=3) + \
line2d([(a, f(a)), (a+h, f(a+h))], color='red') + \
line2d([(a-h, f(a-h)), (a+h, f(a+h))], color='orange')
Example: Testing the two-point approximation¶
We will test our approximation to the derivative using the same function $f(x) = e^x \sin x$ at the point $x=2.2$. Recall $f'(2.2) \approx 1.9854604310541828$.
We repeat the computation of the table using the new approximation:
a = 2.2
for k in range(55):
h = 2**-k
approx = ( f(a+h) - f(a-h) ) / (2*h)
error = df_evaluated - approx
print(f"k = {k:>2} approx = {float(approx):>19} error = {float(error):>10.3e}")
k = 0 approx = -2.2632720891695968 error = 4.249e+00
k = 1 approx = 0.930976959843731 error = 1.054e+00
k = 2 approx = 1.7225417638027842 error = 2.629e-01
k = 3 approx = 1.9197780919414065 error = 6.568e-02
k = 4 approx = 1.969042857954257 error = 1.642e-02
k = 5 approx = 1.9813562268478933 error = 4.104e-03
k = 6 approx = 1.984434391832508 error = 1.026e-03
k = 7 approx = 1.9852039219883295 error = 2.565e-04
k = 8 approx = 1.9853963038340225 error = 6.413e-05
k = 9 approx = 1.9854443992521738 error = 1.603e-05
k = 10 approx = 1.9854564231027325 error = 4.008e-06
k = 11 approx = 1.9854594290673049 error = 1.002e-06
k = 12 approx = 1.9854601805564016 error = 2.505e-07
k = 13 approx = 1.9854603684252652 error = 6.263e-08
k = 14 approx = 1.9854604154024855 error = 1.565e-08
k = 15 approx = 1.985460427153157 error = 3.901e-09
k = 16 approx = 1.9854604300635401 error = 9.906e-10
k = 17 approx = 1.9854604308493435 error = 2.048e-10
k = 18 approx = 1.9854604309657589 error = 8.842e-11
k = 19 approx = 1.985460430616513 error = 4.377e-10
k = 20 approx = 1.9854604313150048 error = -2.608e-10
k = 21 approx = 1.9854604313150048 error = -2.608e-10
k = 22 approx = 1.9854604322463274 error = -1.192e-09
k = 23 approx = 1.9854604303836823 error = 6.705e-10
k = 24 approx = 1.9854604229331017 error = 8.121e-09
k = 25 approx = 1.985460415482521 error = 1.557e-08
k = 26 approx = 1.9854604303836823 error = 6.705e-10
k = 27 approx = 1.9854604601860046 error = -2.913e-08
k = 28 approx = 1.9854604005813599 error = 3.047e-08
k = 29 approx = 1.9854605197906494 error = -8.874e-08
k = 30 approx = 1.9854607582092285 error = -3.272e-07
k = 31 approx = 1.9854602813720703 error = 1.497e-07
k = 32 approx = 1.9854602813720703 error = 1.497e-07
k = 33 approx = 1.9854621887207031 error = -1.758e-06
k = 34 approx = 1.9854583740234375 error = 2.057e-06
k = 35 approx = 1.9854583740234375 error = 2.057e-06
k = 36 approx = 1.985443115234375 error = 1.732e-05
k = 37 approx = 1.9854736328125 error = -1.320e-05
k = 38 approx = 1.985595703125 error = -1.353e-04
k = 39 approx = 1.9853515625 error = 1.089e-04
k = 40 approx = 1.98583984375 error = -3.794e-04
k = 41 approx = 1.9853515625 error = 1.089e-04
k = 42 approx = 1.986328125 error = -8.677e-04
k = 43 approx = 1.98046875 error = 4.992e-03
k = 44 approx = 1.9921875 error = -6.727e-03
k = 45 approx = 1.984375 error = 1.085e-03
k = 46 approx = 1.96875 error = 1.671e-02
k = 47 approx = 1.9375 error = 4.796e-02
k = 48 approx = 1.875 error = 1.105e-01
k = 49 approx = 2.0 error = -1.454e-02
k = 50 approx = 1.5 error = 4.855e-01
k = 51 approx = 3.0 error = -1.015e+00
k = 52 approx = 0.0 error = 1.985e+00
k = 53 approx = 0.0 error = 1.985e+00
k = 54 approx = 0.0 error = 1.985e+00
Above we see that the best approximation occurs when $k=18$, and the computation yields between $10$ and $11$ digits of precision. That is roughly $34$ bits of precision, so we have lost only about a third of the bits of precision with this formula.
Best choice of $h$¶
With the two-point approximation to $f'(a)$ there are the same two competing issues:
- Error due to bad approximation when choosing large $h$.
- Error due to loss of significance when choosing small $h$.
Essentially the best choice of $h$ is the one which makes the errors due to these two issues approximately equal. (To see this, note that if, for example, the greater error is due to bad approximation, then we should shrink $h$ to make that error decrease. This will cause the error due to loss of significance to increase, but that error is smaller, so it matters less.)
Even though we will not repeat the loss-of-significance analysis for the two-point approximation, the error estimate is almost the same as before, roughly $2^{-52}\big/h$. From our error estimate for the two-point method, we see that the error due to bad approximation is roughly $c h^2$ where $c \approx \frac{f^{(3)}(a)}{6}$. We'll just assume $c$ is a reasonably small number. So, we'd like to choose an $h$ such that roughly $$\frac{2^{-52}}{h} = c h^2.$$ Thus, $h^3 = \frac{2^{-52}}{c}$ and so $$h=\frac{2^{-17 \frac{1}{3}}}{\sqrt[3]{c}}.$$ Note that we are not interested in getting an exact answer here. We are happy simply to know what magnitude to choose for $h$. Assuming $c$ is a reasonable number, we should choose something on the order of $h=2^{-17 \frac{1}{3}}$. This will then result in roughly equal error from bad approximation and from loss of significance, as desired.
The value $h=2^{-17 \frac{1}{3}}$ is pretty close to the $h=2^{-18}$ that we saw worked best when looking at the table.
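As a quick sanity check of this heuristic (treating the constant $c$ as if it were $1$, purely for illustration), we can compute the suggested step size:
# The heuristic step size 2^(-52/3), assuming the constant c is about 1.
print(float(-52/3))      # the suggested exponent, about -17.33
print(float(2^(-52/3)))  # the corresponding value of h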
Example: Using NumPy to approximate the first derivative¶
N = 11
f(x) = x^2*sin(1/x)
pts = np.linspace(0, 1, N)
pts
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
h = float(2^-17)
h
7.62939453125e-06
pts_plus = pts + h
pts_plus
array([7.62939453e-06, 1.00007629e-01, 2.00007629e-01, 3.00007629e-01, 4.00007629e-01, 5.00007629e-01, 6.00007629e-01, 7.00007629e-01, 8.00007629e-01, 9.00007629e-01, 1.00000763e+00])
pts_minus = pts - h
pts_minus
array([-7.62939453e-06, 9.99923706e-02, 1.99992371e-01, 2.99992371e-01, 3.99992371e-01, 4.99992371e-01, 5.99992371e-01, 6.99992371e-01, 7.99992371e-01, 8.99992371e-01, 9.99992371e-01])
numerical_derivative = (pts_plus^2*sin(1/pts_plus)-pts_minus^2*sin(1/pts_minus))/(2*h)
numerical_derivative
array([-7.62263328e-06, 7.30267225e-01, -6.67231894e-01, 8.67333226e-01, 1.27992133e+00, 1.32544426e+00, 1.29021310e+00, 1.24411841e+00, 1.20305303e+00, 1.16947994e+00, 1.14263966e+00])
plot(f, (x,0,1)) + \
plot(f.derivative(), (x,0,1), color='green') + \
point2d(zip(pts, numerical_derivative), color='red', size=20)
The second derivative:¶
We will now explain how to use Taylor polynomials to estimate the 2nd derivative using three points. Evaluating the degree $2$ Taylor polynomial and considering the error gives that $$f(a+h) = f(a) + f'(a)h + \frac{f''(a)}{2}h^2 + \frac{f^{(3)}(y_+)}{6}h^3$$ for some $y_+$ between $a$ and $a+h$, and $$f(a-h) = f(a) - f'(a)h + \frac{f''(a)}{2}h^2 - \frac{f^{(3)}(y_-)}{6}h^3$$ for some $y_-$ between $a-h$ and $a$.
Adding the formulas yields: $$f(a+h) + f(a-h) = 2 f(a) + f''(a) h^2 + \frac{f^{(3)}(y_+)-f^{(3)}(y_-)}{6}h^3.$$ Solving for $f''(a)$, we see that $$f''(a) = \frac{f(a+h) - 2 f(a) + f(a-h)}{h^2} - \frac{f^{(3)}(y_+)-f^{(3)}(y_-)}{6}h.$$
This formula looks like it has an error of order $h$, but if $f$ is $4$-times differentiable, we can apply the Mean value theorem to conclude that $$\frac{f^{(3)}(y_+)-f^{(3)}(y_-)}{y_+-y_-} = f^{(4)}(z)$$ for some $z$ between $y_-$ and $y_+$. Thus, $$|f^{(3)}(y_+)-f^{(3)}(y_-)| = (y_+-y_-) |f^{(4)}(z)| \leq 2h |f^{(4)}(z)|.$$ Plugging in, we see that $$\left|f''(a) - \frac{f(a+h) - 2 f(a) + f(a-h)}{h^2}\right| \leq \frac{|f^{(4)}(z)|}{3} h^2$$ for some $z \in (a-h,a+h)$. This is summarized below:
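For a numerical check (added here as an illustration), pushing the Taylor expansion one degree further suggests the leading error term is roughly $-\frac{f^{(4)}(a)}{12}h^2$. The cell below compares that prediction with the actual error; note that $f$ is redefined first, since the NumPy example above changed it.
# Compare the three-point second-derivative error with the approximate leading
# term -f''''(a)*h^2/12 (from the degree-4 Taylor expansion).
f(x) = exp(x)*sin(x)
a = 2.2
d2f_a = f.derivative(x, 2)(a)
d4f_a = f.derivative(x, 4)(a)
for k in range(4, 12):
    h = 2^-k
    approx = (f(a+h) - 2*f(a) + f(a-h)) / h^2
    actual_error = d2f_a - approx
    predicted_error = -d4f_a * h^2 / 12
    print(f"k = {k:>2} actual = {float(actual_error):>10.3e} predicted = {float(predicted_error):>10.3e}")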
The three-point approximation of the 2nd derivative.
Assuming $f$ has a continuous $4$th derivative in a neighborhood of $a$, we have $$f''(a) = \frac{f(a+h) - 2 f(a) + f(a-h)}{h^2} + {\mathcal O}(h^2) \quad \text{as $h \to 0$.}$$
Example
Let's return to the same example $f(x) = e^x \sin x$, this time to compute $f''(2.2)$. To get an exact computation, we can use calculus. Recall $$f'(x)=e^x (\sin x+\cos x).$$ We can compute (or ask Sage to compute) that $$f''(x) = 2 e^x \cos x.$$
f(x) = exp(x)*sin(x)
df(x) = f.derivative()
d2f(x) = df.derivative()
d2f
x |--> 2*cos(x)*e^x
Here is a table computing approximations of $f''(a)$ for values of $h=2^{-k}$. Note that the smallest error we see is about $10^{-7}$ when $k=12$ or $13$.
a = 2.2
for k in range(26):
h = 2**-k
approx = (f(a+h) + f(a-h) - 2*f(a)) / h**2
error = d2f(a) - approx
print(f"k = {k:>2} approx = {float(approx):>19} error = {float(error):>10.3e}")
k = 0 approx = -12.930968611827293 error = 2.309e+00
k = 1 approx = -11.223051809199497 error = 6.006e-01
k = 2 approx = -10.774012993586553 error = 1.516e-01
k = 3 approx = -10.660435816711129 error = 3.797e-02
k = 4 approx = -10.631960153709315 error = 9.499e-03
k = 5 approx = -10.62483616767895 error = 2.375e-03
k = 6 approx = -10.623054854520888 error = 5.938e-04
k = 7 approx = -10.622609506448498 error = 1.485e-04
k = 8 approx = -10.62249816826079 error = 3.711e-05
k = 9 approx = -10.622470333706588 error = 9.278e-06
k = 10 approx = -10.622463375329971 error = 2.320e-06
k = 11 approx = -10.622461639344692 error = 5.840e-07
k = 12 approx = -10.622461199760437 error = 1.444e-07
k = 13 approx = -10.622461199760437 error = 1.444e-07
k = 14 approx = -10.622460842132568 error = -2.132e-07
k = 15 approx = -10.622461318969727 error = 2.636e-07
k = 16 approx = -10.622459411621094 error = -1.644e-06
k = 17 approx = -10.622467041015625 error = 5.986e-06
k = 18 approx = -10.6224365234375 error = -2.453e-05
k = 19 approx = -10.62255859375 error = 9.754e-05
k = 20 approx = -10.623046875 error = 5.858e-04
k = 21 approx = -10.6171875 error = -5.274e-03
k = 22 approx = -10.59375 error = -2.871e-02
k = 23 approx = -10.625 error = 2.539e-03
k = 24 approx = -10.5 error = -1.225e-01
k = 25 approx = -10.0 error = -6.225e-01
We can get a better approximation using more points as indicated below.
The five-point approximation of the 2nd derivative.
Assuming $f$ has a continuous $6$th derivative in a neighborhood of $a$, we have $$f''(a) = \frac{-f(a+2h) + 16 f(a+h) - 30 f(a) + 16 f(a-h) - f(a-2h)}{12 h^2} + {\mathcal O}(h^4) \quad \text{as $h \to 0$.}$$
Proof:
I will briefly explain how this is obtained. Considering Taylor's theorem for $f(a \pm h)$ we can see that $$f(a+h)+f(a-h)-2 f(a) = f''(a)h^2 + \frac{f^{(4)}(a)}{12} h^4 + \frac{f^{(5)}(y_1) - f^{(5)}(y_{-1})}{5!}h^5,$$ for some $y_1 \in (a,a+h)$ and $y_{-1} \in (a-h,a)$. Applying the Mean value theorem then tells us that $$f^{(5)}(y_1) - f^{(5)}(y_{-1})=(y_1 - y_{-1})f^{(6)}(z_1)$$ for some $z_1 \in (y_{-1}, y_1).$ Then, $$f(a+h)+f(a-h)-2 f(a) = f''(a)h^2 + \frac{f^{(4)}(a)}{12} h^4 + \frac{(y_1 - y_{-1})f^{(6)}(z_1)}{5!}h^5.$$
Then substituting $2h$ for $h$, we see that $$f(a+2h)+f(a-2h)-2 f(a) = 4 f''(a)h^2 + \frac{4 f^{(4)}(a)}{3} h^4 + \frac{32(y_2 - y_{-2})f^{(6)}(z_2)}{5!}h^5,$$ for some $y_2 \in (a,a+2h)$, $y_{-2} \in (a-2h,a)$, and $z_2 \in (y_{-2},y_2)$.
Letting $X_1=f(a+h)+f(a-h)-2 f(a)$ and $X_2 = f(a+2h)+f(a-2h)-2 f(a)$, we see that we can remove the degree four term as follows:
$$16 X_1 - X_2 = 12 f''(a)h^2 + \frac{16(y_1 - y_{-1})f^{(6)}(z_1)-32(y_2 - y_{-2})f^{(6)}(z_2)}{5!}h^5.$$
Solving for $f''(a)$, we see that
$$f''(a) = \frac{16 X_1 - X_2}{12 h^2} - \frac{(y_1 - y_{-1})f^{(6)}(z_1)-2(y_2 - y_{-2})f^{(6)}(z_2)}{90}h^3.$$
Now observe that $0 < y_1 - y_{-1} \leq 2h$ and $0 < y_2 - y_{-2} \leq 4h$. So, as long as $f^{(6)}$ is bounded near $a$, the error term above is bounded by a constant times $h^4$, which is exactly the ${\mathcal O}(h^4)$ statement.
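As an added check of the ${\mathcal O}(h^4)$ claim (not part of the original lecture), carrying the Taylor expansion to degree six suggests the leading error term is roughly $\frac{f^{(6)}(a)}{90}h^4$. The cell below compares this prediction with the actual error of the five-point formula:
# Compare the five-point formula's error with the approximate leading term
# f^(6)(a)*h^4/90 obtained from the degree-6 Taylor expansion.
a = 2.2
d2f_a = f.derivative(x, 2)(a)
d6f_a = f.derivative(x, 6)(a)
for k in range(2, 8):
    h = 2^-k
    approx = (-f(a+2*h) + 16*f(a+h) - 30*f(a) + 16*f(a-h) - f(a-2*h)) / (12*h^2)
    actual_error = d2f_a - approx
    predicted_error = d6f_a * h^4 / 90
    print(f"k = {k:>2} actual = {float(actual_error):>10.3e} predicted = {float(predicted_error):>10.3e}")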
Example
Here is a table computing approximations of $f''(a)$ using the five-point formula for values of $h=2^{-k}$. Note that the smallest error we see is about $10^{-10}$ when $k=8$ or $9$. This is substantially better than the three-point formula.
a = float(2.2)
for k in range(26):
h = 2**-k
approx = (-f(a+2*h) + 16*f(a+h) -30 *f(a) + 16*f(a-h) - f(a-2*h)) / (12*h**2)
error = d2f(a) - approx
print(f"k = {k:>2} approx = {float(approx):>19} error = {float(error):>10.3e}")
k = 0 approx = -11.201881721164792 error = 5.794e-01
k = 1 approx = -10.653746208323565 error = 3.129e-02
k = 2 approx = -10.624333388382238 error = 1.872e-03
k = 3 approx = -10.62257675775274 error = 1.157e-04
k = 4 approx = -10.622468266042384 error = 7.211e-06
k = 5 approx = -10.622461505667312 error = 4.503e-07
k = 6 approx = -10.622461083468199 error = 2.815e-08
k = 7 approx = -10.622461057072844 error = 1.750e-09
k = 8 approx = -10.622461055517002 error = 1.939e-10
k = 9 approx = -10.622461055521853 error = 1.987e-10
k = 10 approx = -10.622461055638269 error = 3.152e-10
k = 11 approx = -10.6224610566472 error = 1.324e-09
k = 12 approx = -10.62246104578177 error = -9.541e-09
k = 13 approx = -10.622461095452309 error = 4.013e-08
k = 14 approx = -10.622461080551147 error = 2.523e-08
k = 15 approx = -10.622460762659708 error = -2.927e-07
k = 16 approx = -10.622462272644043 error = 1.217e-06
k = 17 approx = -10.622458140055338 error = -2.915e-06
k = 18 approx = -10.622467041015625 error = 5.986e-06
k = 19 approx = -10.62274169921875 error = 2.806e-04
k = 20 approx = -10.622639973958332 error = 1.789e-04
k = 21 approx = -10.619466145833332 error = -2.995e-03
k = 22 approx = -10.602864583333332 error = -1.960e-02
k = 23 approx = -10.598958333333332 error = -2.350e-02
k = 24 approx = -10.791666666666666 error = 1.692e-01
k = 25 approx = -10.666666666666666 error = 4.421e-02
Of course with all these ideas we can do better if we work with a field with more bits of precision. If we double the bits:
RF = RealField(2*53)
a = RF(2.2)
for k in range(9, 30):
h = 2^-k
approx = (-f(a+2*h) + 16*f(a+h) -30 *f(a) + 16*f(a-h) - f(a-2*h)) / (12*h**2)
error = d2f(a) - approx
print(f"k = {k:>2} error = {float(error):>10.3e} approx = {approx}")
k = 9 error = 6.870e-12 approx = -10.62246105532998510129839294047
k = 10 error = 4.294e-13 approx = -10.62246105532354438026978694141
k = 11 error = 2.684e-14 approx = -10.62246105532314183550216506766
k = 12 error = 1.677e-15 approx = -10.62246105532311667645882657407
k = 13 error = 1.048e-16 approx = -10.62246105532311510401870604958
k = 14 error = 6.552e-18 approx = -10.62246105532311500574118832352
k = 15 error = 4.095e-19 approx = -10.62246105532311499959884067391
k = 16 error = 2.634e-20 approx = -10.62246105532311499921568179091
k = 17 error = 4.701e-21 approx = -10.62246105532311499919404715772
k = 18 error = 2.432e-20 approx = -10.62246105532311499921367008766
k = 19 error = 6.442e-20 approx = -10.62246105532311499925376298050
k = 20 error = 1.954e-19 approx = -10.62246105532311499938477074301
k = 21 error = 1.469e-18 approx = -10.62246105532311500065870829568
k = 22 error = 1.361e-18 approx = -10.62246105532311500055028807843
k = 23 error = 1.972e-17 approx = -10.62246105532311501890944486585
k = 24 error = 3.938e-17 approx = -10.62246105532311503856964426025
k = 25 error = 2.707e-16 approx = -10.62246105532311526986610772383
k = 26 error = 1.094e-15 approx = -10.62246105532311609328151765415
k = 27 error = 2.278e-15 approx = -10.62246105532311727751941058765
k = 28 error = -4.087e-15 approx = -10.62246105532311091224073607009
k = 29 error = 8.940e-15 approx = -10.62246105532312393885755833859
The best error now occurs around $k=17$ or $18$. In both cases, the best choice of $k$ is around $1/6$ of the total number of bits of precision. This is what balancing the two sources of error predicts: with $p$ bits of precision, the loss of significance contributes an error of roughly $2^{-p}/h^2$, while the approximation error is roughly a constant times $h^4$; setting these equal gives $h^6 \approx 2^{-p}$, that is, $k \approx p/6$.
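A tiny sketch of this rule of thumb (an added illustration, ignoring the constants in the analysis):
# Predicted best exponent k for the five-point second-derivative formula,
# assuming the constants in the error analysis are of moderate size.
for p in [53, 106]:
    print(f"precision = {p} bits, predicted best k is about {float(p/6):.1f}")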